
Visual Odometry



Bias-Eliminated PnP for Stereo Visual Odometry: Provably Consistent and Large-Scale Localization

Zeng, Guangyang, Shen, Yuan, Hong, Ziyang, Hong, Yuze, Ila, Viorela, Shi, Guodong, Wu, Junfeng

arXiv.org Artificial Intelligence

Abstract--In this paper, we first present a bias-eliminated weighted (Bias-Eli-W) perspective-n-point (PnP) estimator for stereo visual odometry (VO) with provable consistency. Specifically, leveraging statistical theory, we develop an asymptotically unbiased and √n-consistent PnP estimator that accounts for varying 3D triangulation uncertainties, ensuring that the relative pose estimate converges to the ground truth as the number of features increases. Next, on the stereo VO pipeline side, we propose a framework that continuously triangulates contemporary features for tracking new frames, effectively decoupling temporal dependencies between pose and 3D point errors. We integrate the Bias-Eli-W PnP estimator into the proposed stereo VO pipeline, creating a synergistic effect that enhances the suppression of pose estimation errors. Experimental results demonstrate that our method: 1) achieves significant improvements in both relative pose error and absolute trajectory error in large-scale environments; 2) provides reliable localization under erratic and unpredictable robot motions. The successful implementation of the Bias-Eli-W PnP in stereo VO indicates the importance of information screening in robotic estimation tasks with high-uncertainty measurements, shedding light on diverse applications where PnP is a key ingredient.

Index Terms--Stereo visual odometry, PnP pose estimation, large-scale localization, consistent estimator.

Visual odometry (VO) refers to estimating the pose of a moving camera in 3D space from sequential images captured by the camera. The significance of VO stems from its advantages of being infrastructure-free, cost-effective, lightweight, energy-efficient, etc. [1, 2, 3]. It enables robots to perceive and navigate their environment autonomously. Compared with monocular VO, stereo VO offers several advantages, such as scale consistency, better accuracy, and enhanced robustness, due to its ability to perceive depth directly [4, 5]. Existing VO methods typically optimize both camera poses and 3D map points simultaneously, with the map being used to track new frames through the perspective-n-point (PnP) algorithm [1, 6, 4].
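To make the role of per-feature uncertainty concrete, the following minimal Python sketch solves a pose from 3D-2D correspondences by minimizing a reprojection error in which each point is down-weighted according to an assumed depth-dependent triangulation uncertainty. The 1/depth^2 weighting rule and the generic least-squares solver are illustrative placeholders, not the Bias-Eli-W estimator described in the paper.

```python
# Minimal sketch of a weighted PnP solve: minimize per-point-weighted reprojection
# error, with weights derived from an assumed stereo triangulation uncertainty.
# This is NOT the paper's Bias-Eli-W estimator; it only illustrates why weighting
# high-uncertainty 3D points differently matters in a PnP-based VO front end.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def project(K, rvec, t, X):
    """Project Nx3 world points X into the image with pose (rvec, t) and intrinsics K."""
    Xc = R.from_rotvec(rvec).apply(X) + t          # world -> camera frame
    uv = (K @ Xc.T).T                              # pinhole projection
    return uv[:, :2] / uv[:, 2:3]

def weighted_pnp(K, X, uv_obs, depths, x0=None):
    """Estimate (rvec, t) from 3D points X and 2D observations uv_obs.
    Assumption: triangulation error grows roughly with depth^2 for a stereo rig,
    so each residual is down-weighted by 1/depth^2 (an illustrative heuristic)."""
    w = 1.0 / np.maximum(depths, 1e-6) ** 2
    w = np.sqrt(w / w.max())                       # residual scaling = sqrt of weights

    def residual(params):
        rvec, t = params[:3], params[3:]
        return ((project(K, rvec, t, X) - uv_obs) * w[:, None]).ravel()

    x0 = np.zeros(6) if x0 is None else x0
    sol = least_squares(residual, x0, method="lm")
    return sol.x[:3], sol.x[3:]
```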


BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots

Wei, Yufei, Lu, Wangtao, Lu, Sha, Hu, Chenxiao, Han, Fuzhang, Xiong, Rong, Wang, Yue

arXiv.org Artificial Intelligence

Abstract--Bird's-Eye-View (BEV) representation offers a metric-scaled planar workspace, facilitating the simplification of 6-DoF ego-motion to a more robust 3-DoF model for monocular visual odometry (MVO) in intelligent transportation systems. However, existing BEV methods suffer from sparse supervision signals and information loss during perspective-to-BEV projection. Our approach introduces: (1) dense BEV optical flow supervision constructed from 3-DoF pose ground truth for pixel-level guidance; (2) PV-BEV fusion that computes correlation volumes before projection to preserve 6-DoF motion cues while maintaining scale consistency. The framework employs three supervision levels derived solely from pose data: dense BEV flow, 5-DoF for the PV branch, and the final 3-DoF output. Extensive evaluation on KITTI, NCLT, Oxford, and our newly collected ZJH-VO multi-scale dataset demonstrates state-of-the-art performance, achieving a 40% improvement in RTE compared to previous BEV methods. The ZJH-VO dataset, covering diverse ground vehicle scenarios from underground parking to outdoor plazas, is publicly available to facilitate future research.

Bird's-Eye-View (BEV) representation has become a cornerstone for perception and localization tasks in modern intelligent transportation systems [1]-[3], offering a powerful solution to the scale drift problem inherent in Monocular Visual Odometry (MVO) [4], [5]. For ground vehicles like autonomous cars and logistics robots, motion is predominantly planar [6]. This allows pose estimation to be simplified from six degrees of freedom (6-DoF) to a more robust 3-DoF model (x, y, yaw), which naturally aligns with the unified, metric-scaled grid of the BEV representation [7]. This simplification not only reduces computational complexity but also mitigates the accumulation of errors in non-primary motion axes, a common source of drift in long-range navigation.
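As a point of reference for the 3-DoF simplification, the sketch below recovers (x, y, yaw) motion from matched points in two metric-scaled BEV grids with a closed-form 2D rigid alignment. The paper instead regresses this motion from dense BEV flow, so this is only an illustration of the underlying planar geometry.

```python
# Minimal sketch: recover a 3-DoF (x, y, yaw) motion from matched points in two
# metric-scaled BEV grids via closed-form 2D rigid alignment (Kabsch without scale).
# Illustrative only; BEV-ODOM2 learns this motion rather than solving it in closed form.
import numpy as np

def rigid_2d(src, dst):
    """src, dst: Nx2 matched BEV coordinates (meters). Returns yaw (rad) and t (2,)
    such that dst ~= R(yaw) @ src + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)              # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    Rm = Vt.T @ U.T
    if np.linalg.det(Rm) < 0:                      # enforce a proper rotation
        Vt[1] *= -1
        Rm = Vt.T @ U.T
    yaw = np.arctan2(Rm[1, 0], Rm[0, 0])
    t = mu_d - Rm @ mu_s
    return yaw, t
```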


Deep Visual Odometry for Stereo Event Cameras

Zhong, Sheng, Niu, Junkai, Zhou, Yi

arXiv.org Artificial Intelligence

Event-based cameras are bio-inspired sensors with pixels that independently and asynchronously respond to brightness changes at microsecond resolution, offering the potential to handle state estimation tasks involving motion blur and high dynamic range (HDR) illumination conditions. However, the versatility of event-based visual odometry (VO) relying on handcrafted data association (either direct or indirect methods) is still unreliable, especially in field robot applications under low-light HDR conditions, where the dynamic range can be enormous and the signal-to-noise ratio is spatially and temporally varying. Leveraging deep neural networks offers new possibilities for overcoming these challenges. In this paper, we propose a learning-based stereo event visual odometry. Building upon Deep Event Visual Odometry (DEVO), our system (called Stereo-DEVO) introduces a novel and efficient static-stereo association strategy for sparse depth estimation with almost no additional computational burden. By integrating it into a tightly coupled bundle adjustment (BA) optimization scheme, and benefiting from the recurrent network's ability to perform accurate optical flow estimation through voxel-based event representations to establish reliable patch associations, our system achieves high-precision pose estimation in metric scale. In contrast to the offline performance of DEVO, our system can process event data of Video Graphics Array (VGA) resolution in real time. Extensive evaluations on multiple public real-world datasets and self-collected data justify our system's versatility, demonstrating superior performance compared to state-of-the-art event-based VO methods. More importantly, our system achieves stable pose estimation even in large-scale nighttime HDR scenarios.
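For context on what a static-stereo association ultimately provides, the short sketch below converts a disparity map from a rectified stereo pair into metric depth via z = f·b/d; the focal length, baseline, and validity threshold are placeholder values, not parameters from the paper's setup, and the paper's learned association pipeline is of course far richer.

```python
# Minimal sketch: depth from disparity in a rectified stereo rig, the geometric
# relation any static-stereo association ultimately relies on (z = f * b / d).
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disp=0.5):
    """Convert a disparity map (pixels) to metric depth; small/invalid disparities
    are masked out as NaN rather than producing unbounded depths."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.nan)
    valid = disparity_px > min_disp
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth
```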


Good Deep Features to Track: Self-Supervised Feature Extraction and Tracking in Visual Odometry

Gottam, Sai Puneeth Reddy, Zhang, Haoming, Keras, Eivydas

arXiv.org Artificial Intelligence

Abstract--Vision-based localization has made significant progress, yet its performance often drops in large-scale, outdoor, and long-term settings due to factors like lighting changes, dynamic scenes, and low-texture areas. These challenges degrade feature extraction and tracking, which are critical for accurate motion estimation. While learning-based methods such as SuperPoint and SuperGlue show improved feature coverage and robustness, they still face generalization issues with out-of-distribution data. We address this by enhancing deep feature extraction and tracking through self-supervised learning with task-specific feedback. Our method promotes stable and informative features, improving generalization and reliability in challenging environments.


Odometry Calibration and Pose Estimation of a 4WIS4WID Mobile Wall Climbing Robot

Ćaran, Branimir, Milić, Vladimir, Švaco, Marko, Jerbić, Bojan

arXiv.org Artificial Intelligence

Abstract--This paper presents the design of a pose estimator for a four-wheel independent steer, four-wheel independent drive (4WIS4WID) wall-climbing mobile robot, based on the fusion of multimodal measurements, including wheel odometry, visual odometry, and inertial measurement unit (IMU) data, using an Extended Kalman Filter (EKF) and an Unscented Kalman Filter (UKF). The pose estimator is a critical component of wall-climbing mobile robots, as they carry precision measurement equipment and maintenance tools on construction sites and must know their pose on the building at the time of measurement. Due to the complex geometry and material properties of building façades, the use of traditional localization sensors such as laser, ultrasonic, or radar is often infeasible for wall-climbing robots. Moreover, GPS-based localization is generally unreliable in these environments because of signal degradation caused by reinforced concrete and electromagnetic interference. Consequently, robot odometry remains the primary source of velocity and position information, despite being susceptible to drift caused by both systematic and non-systematic errors. The robot's systematic parameters were calibrated using nonlinear optimization with the Gauss-Newton and Levenberg-Marquardt gradient-based model-fitting methods, while a genetic algorithm and particle swarm optimization were used as stochastic methods for kinematic parameter calibration. The calibration methods and pose estimators were validated in detail through experiments on the experimental mobile wall-climbing robot.
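The following sketch shows the generic predict/update structure of a planar EKF pose estimator of the kind described here: a unicycle motion model driven by wheel-odometry velocities, fused with an absolute pose measurement (e.g., from visual odometry). The state, noise models, and measurement model are simplified assumptions for illustration, not the robot-specific filter from the paper.

```python
# Minimal sketch of a planar EKF pose estimator. State x = [x, y, yaw].
# Prediction uses a unicycle model with odometry velocities; the update fuses a
# direct pose measurement. Illustrative structure only, not the paper's filter.
import numpy as np

def wrap(a):                                       # keep angles in (-pi, pi]
    return (a + np.pi) % (2 * np.pi) - np.pi

def ekf_predict(x, P, v, w, dt, Q):
    """Propagate the state with linear/angular velocities (v, w) over dt."""
    th = x[2]
    x_pred = x + np.array([v * np.cos(th) * dt, v * np.sin(th) * dt, w * dt])
    x_pred[2] = wrap(x_pred[2])
    F = np.array([[1.0, 0.0, -v * np.sin(th) * dt],
                  [0.0, 1.0,  v * np.cos(th) * dt],
                  [0.0, 0.0,  1.0]])
    return x_pred, F @ P @ F.T + Q

def ekf_update(x, P, z, Rm):
    """Fuse a direct pose measurement z = [x, y, yaw] with covariance Rm."""
    H = np.eye(3)
    y = z - H @ x
    y[2] = wrap(y[2])
    S = H @ P @ H.T + Rm
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + K @ y
    x_new[2] = wrap(x_new[2])
    return x_new, (np.eye(3) - K @ H) @ P
```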



DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model

Azhari, Maulana Bisyir, Shim, David Hyunchul

arXiv.org Artificial Intelligence

Learning-based monocular visual odometry (VO) faces robustness, generalization, and efficiency challenges in robotics. Recent advances in visual foundation models, such as DINOv2, have improved robustness and generalization in various vision tasks, yet their integration in VO remains limited due to coarse feature granularity. In this paper, we present DINO-VO, a feature-based VO system leveraging the DINOv2 visual foundation model for its sparse feature matching. To address the integration challenge, we propose a salient keypoints detector tailored to DINOv2's coarse features. Furthermore, we complement DINOv2's robust semantic features with fine-grained geometric features, resulting in more localizable representations. Finally, a transformer-based matcher and a differentiable pose estimation layer enable precise camera motion estimation by learning good matches. Against prior detector-descriptor networks like SuperPoint, DINO-VO demonstrates greater robustness in challenging environments. Furthermore, we show superior accuracy and generalization of the proposed feature descriptors against standalone DINOv2 coarse features. DINO-VO outperforms prior frame-to-frame VO methods on the TartanAir and KITTI datasets and is competitive on the EuRoC dataset, while running efficiently at 72 FPS with less than 1 GB of memory usage on a single GPU. Moreover, it performs competitively against Visual SLAM systems on outdoor driving scenarios, showcasing its generalization capabilities.
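To illustrate one generic way a salient-keypoint detector can operate on coarse features (e.g., DINOv2 patch tokens reshaped to their patch grid), the sketch below scores grid cells by feature norm and keeps local maxima. This heuristic and the patch-size parameter are assumptions for illustration, not the detector proposed in DINO-VO.

```python
# Minimal sketch: pick "salient" keypoints on a coarse H x W x C feature grid by
# taking local maxima of a simple feature-norm saliency map, then map the winning
# cells back to approximate pixel coordinates. Illustrative heuristic only.
import numpy as np

def salient_keypoints(feat_grid, top_k=200, patch=14):
    """feat_grid: (H, W, C) coarse features; returns (top_k, 2) pixel coordinates."""
    H, W, _ = feat_grid.shape
    saliency = np.linalg.norm(feat_grid, axis=-1)          # (H, W) response map
    # 3x3 non-maximum suppression over the grid
    padded = np.pad(saliency, 1, mode="edge")
    is_max = np.ones_like(saliency, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_max &= saliency >= padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    ys, xs = np.nonzero(is_max)
    order = np.argsort(-saliency[ys, xs])[:top_k]           # strongest responses first
    ys, xs = ys[order], xs[order]
    # map cell centers back to image pixels (patch = assumed ViT patch size in pixels)
    return np.stack([xs * patch + patch / 2, ys * patch + patch / 2], axis=1)
```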


An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World

Ji, Xingwu, Niu, Haochen, Duan, Dexin, Ying, Rendong, Wen, Fei, Liu, Peilin

arXiv.org Artificial Intelligence

Recently, learning-based robotic navigation systems have gained extensive research attention and made significant progress. However, the diversity of open-world scenarios poses a major challenge for the generalization of such systems to practical scenarios. Specifically, learned systems for scene measurement and state estimation tend to degrade when the application scenarios deviate from the training data, resulting in unreliable depth and pose estimation. Toward addressing this problem, this work aims to develop a visual odometry system that can quickly adapt to diverse novel environments in an online manner. To this end, we construct a self-supervised online adaptation framework for monocular visual odometry aided by an online-updated depth estimation module. Firstly, we design a monocular depth estimation network with lightweight refiner modules, which enables efficient online adaptation. Then, we construct an objective for self-supervised learning of the depth estimation module based on the output of the visual odometry system and the contextual semantic information of the scene. Specifically, a sparse depth densification module and a dynamic consistency enhancement module are proposed to leverage camera poses and contextual semantics to generate pseudo-depths and valid masks for the online adaptation. Finally, we demonstrate the robustness and generalization capability of the proposed method in comparison with state-of-the-art learning-based approaches on urban and in-house datasets and a robot platform. Code is publicly available at: https://github.com/jixingwu/SOL-SLAM.
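A minimal sketch of the kind of self-supervision signal involved is given below: a masked loss between the predicted dense depth and sparse pseudo-depth (e.g., triangulated VO landmarks projected into the image), evaluated only at valid pixels. The paper's actual objective additionally uses depth densification and dynamic-consistency terms, so this is only an assumption-laden illustration of the general idea.

```python
# Minimal sketch: masked L1 loss between a predicted dense depth map and sparse
# pseudo-depth, the basic self-supervision signal for online depth adaptation.
# Illustrative only; the paper's objective is richer than this single term.
import torch

def sparse_depth_loss(pred_depth, pseudo_depth, valid_mask):
    """pred_depth, pseudo_depth: (B, 1, H, W) tensors; valid_mask: (B, 1, H, W) bool.
    The loss is averaged only over pixels where pseudo-depth is available."""
    diff = torch.abs(pred_depth - pseudo_depth)
    n_valid = valid_mask.sum().clamp(min=1)
    return (diff * valid_mask).sum() / n_valid
```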


SeeTree -- A modular, open-source system for tree detection and orchard localization

Brown, Jostan, Grimm, Cindy, Davidson, Joseph R.

arXiv.org Artificial Intelligence

Accurate localization is an important functional requirement for precision orchard management. However, there are few off-the-shelf commercial solutions available to growers. In this paper, we present SeeTree, a modular, open-source embedded system for tree trunk detection and orchard localization that is deployable on any vehicle. Building on our prior work on vision-based in-row localization using particle filters, SeeTree includes several new capabilities. First, it supports full orchard localization, including out-of-row headland turning. Second, it includes the flexibility to integrate either visual, GNSS, or wheel odometry in the motion model. During field experiments in a commercial orchard, the system converged to the correct location 99% of the time over 800 trials, even when starting with large uncertainty in the initial particle locations. When turning out of row, the system correctly tracked 99% of the turns (860 trials representing 43 unique row changes). To help support adoption and future research and development, we make our dataset, design files, and source code freely available to the community.
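For readers unfamiliar with the underlying machinery, the sketch below shows one generic particle-filter localization step (odometry propagation, measurement re-weighting, resampling). The likelihood function is a placeholder, and SeeTree's trunk-detection measurement model, map handling, and headland-turn logic are not represented here.

```python
# Minimal sketch of one particle-filter localization step: propagate particles with
# noisy odometry, re-weight them with a measurement likelihood (placeholder), and
# resample when the effective sample size collapses. Generic structure only.
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, odom, odom_noise, likelihood_fn, measurement):
    """particles: (N, 3) poses [x, y, yaw]; odom: (dx, dy, dyaw) in the robot frame;
    odom_noise: per-axis noise std; likelihood_fn(particles, measurement) -> (N,) scores."""
    N = len(particles)
    # 1) motion update: apply odometry in each particle's frame, with added noise
    noise = rng.normal(0.0, odom_noise, size=(N, 3))
    dx, dy, dyaw = (np.asarray(odom) + noise).T
    c, s = np.cos(particles[:, 2]), np.sin(particles[:, 2])
    particles[:, 0] += c * dx - s * dy
    particles[:, 1] += s * dx + c * dy
    particles[:, 2] += dyaw
    # 2) measurement update: score each particle against the current detections
    weights = weights * likelihood_fn(particles, measurement)
    weights = weights / max(weights.sum(), 1e-12)
    # 3) systematic resampling when the effective sample size drops below N/2
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = np.searchsorted(np.cumsum(weights), (rng.random() + np.arange(N)) / N)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```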